Cost Impact Analysis of Team Data
#Global get comma sep csv and excel data read function
get_data <- function(filepath){
#stop execution once the file is a non-character
stopifnot("`filename` must be a character." = is.character(filepath))
#Check if file is not an excel or csv
if(!is.null(str_subset(filepath, pattern = ".xls{1}$|.xlsx{1}$|.csv{1}$", negate = TRUE)) &&
length(str_subset(filepath, pattern = ".xls{1}$|.xlsx{1}$|.csv{1}$", negate = TRUE)) == 1 &&
!is.na(str_subset(filepath, pattern = ".xls{1}$|.xlsx{1}$|.csv{1}$", negate = TRUE))
) {
stop("not excel or csv file")
}
#Execute the function body
if (isTRUE(stringr::str_extract(filepath, pattern = ".xls{1}$") == ".xls")) {
result <- readxl::read_xls(filepath) |>
janitor::clean_names() |>
dplyr::mutate_at(dplyr::vars(contains("date")),
~as.Date(., format = "%m/%d/%Y"))
} else if(isTRUE(stringr::str_extract(filepath, pattern = ".xlsx{1}$") == ".xlsx")) {
result <- readxl::read_xlsx(filepath) |>
janitor::clean_names() |>
dplyr::mutate_at(dplyr::vars(contains("date")),
~as.Date(., format = "%m/%d/%Y"))
} else {
result <- readr::read_csv(filepath) |>
janitor::clean_names() |>
dplyr::mutate_at(dplyr::vars(contains("date")),
~as.Date(., format = "%m/%d/%Y"))
}
return(result)
}# return on investment function
roi <- function(tot_savings,overall_costs){
round(((tot_savings - overall_costs)/overall_costs) * 100,2)
}1 Objectives
To Provide an overview of:
- Overall cost (only large items such as feld-m, fivetran, etc.)
- Cost/impact analysis of these tools
- Quantified impact of the three teams
- Quantified impact of an additional data scientist
2 Fivetran
Fivetran is an off-the-shelf tool used primarily for Data ingestion of more than 60% of our existing data sources. Fivetran has 5 pricing plans. We are currently subscribed to the Starter plan
We use fivetran to extract data from 20 source connectors of 7 types/categories as shown below.
connectors <- tbl(con,
sql("
select official_connector_name,type, availability
from fivetran_log.connector_type
")) %>% as_tibble() %>%
#lower case + exclude
filter(!tolower(official_connector_name) %in% c("mailchimp", "freshdesk"))
connectors %>% datatable(
extensions = 'Buttons',
options = list(
dom = 'Blfrtip',
buttons = c('copy', 'csv', 'excel', 'pdf'),
lengthMenu = list(c(10, 25, 50, -1),
c(10, 25, 50, "All"))
)
)
2.1 Cost impact analysis
For proper evaluation of cost and impact analysis of fivetran, i have formulated the questions below to guide the analysis:
What are the pros and cons of fivetran in our current and future infra setup?
What is the overall amount of budget (yearly contract value) and monthly spend per year?
How does the monthly (actual) spend change over months for each year and what is the predicted cost for the upcoming months?
What is our budget estimate for fivetran in Year 2023?
What is the cost of replacing fivetran?
2.2 Pros of Fivetran
“What are the pros and cons of fivetran in our current and future infra setup?”
Self- Service Platform - Anyone in the team can setup and modify data sources. Less dependence on engineers and that’s why we didn’t have them from the start.
Ease of setup - New ingestion pipelines can be setup in less than 7mins.
Multiple data sources - There is a growing list of generally available and stable connectors.
Support & Maintenance - Maintenance is outsourced and support is based on SLA.
Stability - No major outage in the last 2.5 years of use and APIs are very mature and stable
No data loss - Even if data is lost or corrupted on our destination databases, we can always re-ingest/resync the original source data
Guaranteed Availability - We can even exceed planned consumption and still enjoy the services while you plan for a renewal.
2.3 Cons of Fivetran
Consumption-based pricing - This kind of pricing model can be tricky and moderately expensive.
Hard to estimate costs based on consumption - Increased sales will lead to increased monthly active rows. In 2021 we used up our annual budget 1 - 2 months month less than planned.
Not possible to define specific timeframes for data load - Not currently possible to reload/resync data based on specific custom time frames. You may have to reload the entire data from inception which can be a time-waster.
Slightly more expensive if you require more features beyond the basic (starter) plan - On a higher plan, it recomputes the data ingestion based on a higher pricing plan rate even if you only wanted one additional feature.
no refund /carry over of excess credits into new financial year - Fivetran implemented a new policy in late 2021 where customers will not be refunded for excess credits purchased and there will also be no carry over into a new financial year. This definitely means the estimation must be really precise to avoid financial loss.
What is the overall amount of budget (yearly contract value) and monthly spend per year?
The table below shows a breakdown of year-over-year spend on fivetran software.
ft_year <- fivetran_estm %>%
#dplyr::select_all(tolower) %>%
mutate(year = year(date)) %>%
group_by(year) %>%
summarise(spend = sum(actual))%>% as_tibble()%>%
rename("spend (€)" = "spend")
kbl(ft_year) %>%
#kable_styling(bootstrap_options = "striped", full_width = F, position = "float_right")
kable_styling(bootstrap_options = "striped", full_width = F, position = "left")| year | spend (€) |
|---|---|
| 2020 | 6220.37 |
| 2021 | 21746.34 |
| 2022 | 47929.72 |
Note: Spend (€) shown for 2022 is as at Sept 2022.
2.4 Actual Spend vs Predicted
“How does the monthly (actual) spend change over months for each year and what is the predicted cost for the upcoming years?”
Yearly Contract Value over the years has tripled, monthly spend has quadruplued largely due to growing volumes of new data sources, product portfolio and market expansions.
YCV 2022 still provides relative good spend coverage for upcoming growth plans. For 2023, we will anticipate at least 2.5X current YCV.
ggplot(fivetran_estm, aes(x=date)) +
geom_line(aes(y = budget), color = "gray", linetype="twodash", size = 1.45) +
geom_line(aes(y = actual), color="#FF5E0D", size = 1.7) +
geom_line(data = fivetran_estm |> filter(date >= "2022-03-01"),
aes(y = arima_model), color ="#370A5A", size = 1.45,linetype="twodash", alpha = 0.5) +
labs(
y = "Spend (€)",
x = "",
#title = " <b style= 'color:#545454;;'>Fivetran: Budget vs Expense</b>",
title = " <b style= 'color:#545454;;'>Fivetran Starter Plan: Yearly Contract Value vs Spend</b>",
subtitle = "<br><b style= 'color:#792DC5;'>YCV over the years has trippled, </b><b style= 'color:#FF5E0D;'>monthly spend has quadrupled largely due to growing vol of new data sources and market expansions.</b><br>
<b style= 'color:#6a6d6f;;'>YCV 2022 still provides relative spend coverage for upcoming growth plans, for 2023, we will anticipate at least 2X current YCV.</b><br><br>"
# caption = paste("Data accessibility by Etoma Egot",
# "Data source: {qualtrics}",
# sep = "\n")
# added a vertical line to indicate beginning of the dollar pricing model
) +
geom_vline(data = fivetran_estm,
mapping = aes(xintercept= as.Date("2022-02-01")),
size = 1, colour = "black", linetype = "dashed", alpha = 0.5) +
#adding pricing model
annotate(
geom = "text",
x = as.Date("2021-10-01"),
y = 11000, hjust = 0, vjust = 0.5,
label = "credit-based\npricing model",
family = "Roboto",
size = 5.0
)+
annotate(
geom = "text",
x = as.Date("2022-03-01"),
y = 11000, hjust = 0, vjust = 0.5,
label = "dollar-based\npricing model",
family = "Roboto",
size = 5.0
#colour = '#5F249F'
)+
#setting a limit and adding € to the y-axis values
scale_y_continuous(
breaks = seq(2000, 12000, 2000),
limits = c(700, 12000),
label = function(x) {
return(paste(x, "€"))
}
)+
# adding yearly contract value to the plot
geom_point(data = df_date,
mapping = aes(x = mdate, y =tcv_label,
size =15, alpha = 0.5,
colour = year),
) +
#scale the size of the circles
scale_size_area(max_size =11, guide = FALSE) +
#manually specify the circle colours on the plot
scale_color_manual(values = c("#792DC5","#792DC5","#792DC5","#5b5e5f"))+
#labelling the total spend per year for the bubbles
geom_text(
aes(x = mdate, y= tcv_label_ext, colour = year,
label = paste(paste("YCV",year),tcv, sep = "\n")),
data = df_date,
size = 3.5,
vjust = 0.5,
hjust = 0.25,
nudge_y = -0.30,
nudge_x = -0.50
)+
#annotating the Year 2023 planned spend
annotate(
geom = "text",
x = as.Date("2020-12-31"),
y = 10500, hjust = 0.3, vjust = 0.5,
label = "For 2023, we are projecting\nmore than 2X the YCV 2022",
family = "Roboto",
size = 4.0,
color = "#5b5e5f"
) +
#annotating the arima model line
annotate(
geom = "text",
x = as.Date("2022-10-01"),
y = 6700, hjust = 0, vjust = 0.5,
label = "Arima Time Series\nmodel predictions",
family = "Roboto",
size = 4.0,
color = "#370A5A"
) +
####legends - Budget
geom_text(
data = fivetran_events |>
filter(date == "2020-08-07" & events == "Budget"),
mapping = aes(x = date, y = budget, label = events),
size = 3.5, vjust = 0.5, hjust = 0, colour = "#979c9e") +
####legends - Spend
geom_text(
data = fivetran_events |>
filter(date == "2020-08-07" & events == "Spend"),
mapping = aes(x = date, y = budget, label = events),
size = 3.5, vjust = 0.5, hjust = 0, colour = "#FF5E0D") +
#
# ####all events that are live
geom_text(
data = fivetran_events |>
filter(!events %in% c("Steel Bottle Launch","IE Launch","DK/CZ Launch","Black Friday22","Budget","Spend")) ,
mapping = aes(x = date, y = budget, label = events),
size = 3.5, vjust = 0.5, hjust = 0, colour = "#cc4931") +
# ####all events excepts those that are live
geom_text(
data = fivetran_events |>
filter(events %in% c("Black Friday22")) ,
mapping = aes(x = date, y = budget, label = events),
size = 3.5, vjust = 0.5, hjust = 0, colour = "#979c9e") +
scale_x_date(date_breaks = "4 month",
date_labels = "%b %Y",
limits = c(as.Date("2020-08-01"), NA)) +
theme_WD() +
theme (legend.position = "none",
plot.title.position = "plot",
plot.margin = margin(t = 15, r = 15, l=5),
plot.title = ggtext::element_markdown(),
plot.subtitle = element_markdown(lineheight = 1.2, size = 14),
panel.grid.major.y = ggplot2::element_blank(),
#plot.caption = ggtext::element_markdown(),
axis.title.x = ggtext::element_markdown(size = 3),
axis.title.y = ggtext::element_markdown(size = 12, hjust = 0.9),
strip.text = element_text(size = 14, color = "#A8A7A7"),
strip.background = element_rect(color = "#474747", fill = "#474747"),
#axis.text = element_text(family = "Roboto", color = "#161616", size=15),
axis.text = element_text(family = "Roboto", color = "#474747", size=11)
)
What is the budget estimate for fivetran in Year 2023?
With plans to launch six new markets and also grow our product portfolio. We are projecting 2.5X the yearly contract value of 2022. Ballpark figure of around 120K € under the current starter plan.
If we will use fivetran for more real-time data refresh e.g. 15mins refresh time, then we look 3X YCV of 2022. Shorter refresh times with fivetran will require an upgrade to fivetran’s standard plan. For more information, see pricing plans.
What is the cost of replacing fivetran with internal developments?
Realistically, we will be talking about hiring at least 10 data engineers who will have to replicate the source connectors with internally developed APIs. While these saves cost after a one-time development, it also creates a maintenance bottleneck and a constant need to be up-to-date to upgrade APIs whenever there are changes/upgrades from the source APIs.
Perhaps, a better alternative with fewer data engineers will be to introduce a parallel, competitive and less expensive data ingestion tool. The closest competitor is airbyte (open source with community contributors) but their connectors are not very stable yet. They are definitely challengers worth considering for the future.
3 FELD-M
feld-m is an external consultancy agency that provides services on different business areas. feld-m operates a time & material contract with an hourly rate of 137.5 €/hr. They provide(d) services to air up that span different areas such as:
- Business Intellgence
- Web Analytics
- Data Engineering
- Data Science
- Project Manager & Stand ups
For proper evaluation of cost and impact analysis of feld-m, i have formulated the questions below to guide the analysis:
What is the overall amount of budget and hours used and the remaining amounts for each?
How does the actual cost change over each quarter of each year? and what is the predicted cost for the following 6 months?
What are our future plans with FELD-M?
3.1 Overall Budget Spent
What is the overall amount of budget spent on the different service areas with FELD-M?
feld_m %>%
group_by(general_topic,year) %>%
summarise(across(contains(c("used_budget","duration")), list(total = sum),na.rm = TRUE, .names = "{col}.{fn}")) %>%
arrange(desc(year)) %>%
mutate_at(vars(contains("used_budget_percent.total")), ~round(used_budget_percent.total,2)*100) %>%
rename(share = "used_budget_percent.total","used_budget(€)" = "used_budget.total",
"duration(hr)" = "duration.total") %>%
ungroup(general_topic) %>%
arrange(desc(share)) %>%
arrange(desc(year)) %>%
mutate_at(vars(contains("share")), ~paste(.,"%")) %>%
dplyr::relocate(-share) %>% datatable(
extensions = 'Buttons',
options = list(
dom = 'Blfrtip',
buttons = c('copy', 'csv', 'excel', 'pdf'),
lengthMenu = list(c(10, 25, 50, -1),
c(10, 25, 50, "All"))
)
)
Note: used budget (€) shown for 2022 is as at Sept 2022.
This is best illustrated visually using the slope chart below to see the trend by service area. Note: Click on the chart visuals so they pop out
For the last 3 quarters, (Q4 2021, Q1 2022, Q3 2022), Business Intelligence has been on the rise due to high demand especially triggered by capacity needs of the data democracy project which commenced in Q4 2021 and the increased meeting from sprint stand ups.
##---------------Prepping Slope Chart-------------
#Slope Graph
ggplot(data = feldm_slope, aes(x = year, y = used_budget.total,
group = general_topic, color=general_topic)) +
geom_line(size = 2) +
geom_point(size = 5) +
labs(
y = "Spend (€)",
x = "",
title = " <b style= 'color:#545454;;'>FELD-M: YOY Used Budget by Functional Area</b>",
subtitle = "<br><b style= 'color:#535256;'>Business Intelligence increase was due to data democracy project capacity demands (+2 internal analysts), data engineering increase was due to<br> increase in demand for custom API development, data science increase was due to capacity demands for CLV Sprints (+1 data scientist),</b><br><b style= 'color:#792DC5;'>The decrease observed for PM/StandUps was due to reduction in meeting frequency.</b><br>",
caption = paste("feld-m budget usage by Etoma Egot",
"Data source: {Airup X FeldM Project Accounting Dashboard}",
sep = "\n")
) +
scale_x_discrete(position = "top") +
theme_WD()+
## Changing font size of x-axis and main title of the graph
theme(
#plot.title = element_text(color="black", size=15, face="bold", hjust = 0.5 ),
axis.text.x = element_text(vjust = 0.5, hjust=0.5, size = 22, face="bold"),
## Removing grid lines
panel.grid.major.y = element_blank(),
panel.grid.minor.y = element_blank(),
panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank(),
##Add markdown elements for HTML
plot.margin = margin(t = 15, r = 15, l=5),
plot.title = ggtext::element_markdown(),
plot.subtitle = element_markdown(lineheight = 1.2, size = 14),
plot.caption = ggtext::element_markdown(),
## Removing the legend
legend.position = "none",
## Removing everything from the y axis
axis.title.y = element_blank(),
axis.text.y = element_blank(),
## Removing title from the x axis
axis.title.x = element_blank(),
## Remove x & y tick marks
axis.ticks = element_blank()) +
##Add labels to the slope chart
geom_text(data = feldm_slope %>% filter(year == "2021"),
#aes(label = paste0(general_topic, " - ", round(used_budget_abb.total,0), "K")),
aes(label = paste0(" ",round(used_budget_abb.total,0), "K")),
hjust = 1.1,
#fontface = "bold",
vjust=0.25,
size = 6.5
) +
geom_text(data = feldm_slope %>% filter(year == "2022"),
aes(label = paste0( round(used_budget_abb.total,0), "K", " ",general_topic)) ,
hjust = -0.1,
fontface = "bold",
vjust = 0.25,
size = 6.5
) +
scale_color_manual(values = c("#B5B1B9","#B5B1B9","#B5B1B9","#792DC5","#B5B1B9"))Figure 3.1: …
3.2 Quarterly Trends
What is the trend of budget consumed over each quarter of each year?
Let’s look at the budget consumption per service area by quarter since our contract with feld-m is based on quarterly renewals.
##--------Small multiples view of the budget distribution by function-----------
## Small multiples visualisation
feldm_small |>
mutate_at(vars(contains("general_topic")),~factor(.,
levels = c("BI",
"Data Engineering",
"Web Analytics",
"Data Science",
"PM & Standups"))) |>
#group_by(general_topic) |>
ggplot(aes(x=quarter, y = used_budget.total, fill = general_topic)) +
geom_col(position = "dodge",binwidth = 0.5, color = "#ffffff", ##percent stacked barchart
stat = "identity")+
scale_fill_manual(values = c("#bfc2c2","#bfc2c2","#bfc2c2","#bfc2c2","#792DC5"))+
geom_text(data = feldm_small,
aes(label = paste0(round(used_budget_abb.total,0), "K")) ,
#aes(label = paste0(used_budget_abb.total),"K"),
#aes(label = used_budget_abb.total),
hjust = 1.5,
#fontface = "bold",
vjust=-0.25,
size = 6.5,
nudge_y = -0.55,
nudge_x = 0.1
)+
facet_wrap(~general_topic)+
labs(
y = "Spend (€)",
x = "",
title ="How does the actual cost change over each quarter of each year?",
# title = " <b style= 'color:#545454;;'>FELD-M: YOY Used Budget by Functional Area</b>",
subtitle = "<br><b style= 'color:#535256;'>Costs reduced significantly in Q3 2022 except in web analytics,</b><b style= 'color:#792DC5;'> only the decline for PM/StandUps remained consistent through 2022.</b>")+
theme_WD()+
theme(legend.position = "none",
axis.title.y = element_blank(),
axis.text.y = element_blank(),
plot.margin = margin(t = 15, r = 15, l=5),
plot.title = ggtext::element_markdown(),
plot.subtitle = element_markdown(lineheight = 1.2, size = 14),
plot.caption = ggtext::element_markdown(),
## Remove x & y tick marks
axis.ticks = element_blank())Figure 3.2: …
What are our future plans with FELD-M?
Following the success of scaling our team between 2021 and 2022, i believe we have really strong talents in the team with the capacity and mindset to deliver as a standalone team. We have resorted to first reset our quarterly commitments with FELD-M to 1/4 of Q1 2022 (25K per quarter) and this will gradually evaporate to zero expenditure in mid-2023 or EOY 2023.
For more information. Please check out our accounting dashboard
4 Team Data
4.1 Context
While there are no absolute methods or formulae for measuring ROI for data teams. We have formulated a measurement based on established best practices for measurement of ROI. Nonetheless, we applied it in the context of our specific scenario as data team of air up.
We currently do not sell our data products neither do we charge stakeholders directly for services rendered, doing this would have made it easier to quantify impact or directly measure our contributions to revenue. It’s often easier for the stakeholders/consumers to tell the ROI story.
Data teams operate in a very peculiar way. We act in support of other teams to impact business performance, increase profits and efficiency as well as enhance the quality of business decisions based on accurate information and insights from business intelligence deliverables.
A significant value the data team offers which is difficult to quantify is the ability to centralize, transform and aggregate data in a single source of truth and programmatically slice and dice or synthesize insights from different and mostly incompatible sources into one global data model which is then visualized on a central BI platform for users to self-serve e.g. tableau accessible to every licensed user.
Hence the premise of ROI computation for a data team is most feasible when benefits is measured based on direct savings made from eliminating redundant/recurring work, time saved from acessing different platforms (e.g 10 webshops) and ultimately generating value in near-real time.
4.2 Services Offered
We provide the following services:
- Reports/Dashboards
- Data Models
- Adhoc Analysis (One-off)
- Consultancy (Provide Clarifications, trainings, program abstract ideas into reality, intellectual powerhouse)
- Advanced Modelling (Predictions/ forecast with AI, ML, Deep Learning, hypothesis validation etc.)
- GA/GTM Tracking (Support Market Expansions, Geolocation tracking etc)
- Data Integration (Build and maintain custom data source consumers)
- Data Management (Governance, Maintenance, Monitoring etc.)
4.3 Prioritization
The journey of quantifying impact starts with ensuring a proper prioritization and task impact assessment is carried out. We usually demand the following information from our stakeholders: As a direct result of this exercise, we have also changed our request forms to mandate the users to provide answers to 2 of 3 questions.
Link to company or team goals
Which company goal does it support and please explain how it supports the company goal?
Business Impact
What type of business decision would you make with each chart?
Time savings
How many hours per day/week do you spend getting these information through other mediums? or working in excel?
4.4 Impact vs Operationalization Matrix
We have summarized our value quantification process using the below impact vs operationalization matrix. As a team, our aim is to prioritize tasks that aim for immediate impact with recurring long term value.
knitr::include_graphics(here::here("images/prioritization_matrix.png"))Figure 4.1: …
#knitr::include_graphics('images/prioritization_matrix.png') -works but not on this machine, use the top one
4.5 Quantifying Impact of Teams
Let’s start with the definition of Return on Investment (ROI)
ROI = (Net Income from investment / cost of investment) X 100
4.6 Assumptions
- Used 220 working days in a year (excludes weekends and vacations)
- Hourly rate differs by role seniority. It increases,the more senior the role..
- 1hr per day of work saved per first level employee.
- 1.5hr minimum saved per mid level to exec because time is more precious/expensive as you go higher up. - - Time saved was positively correlated with hourly rates.
- personnel costs computed excludes bonus, training, team dinners, reimbursements etc.
- software costs doesn’t include cloud infrastructure license for redshift infrastructure.
- Based on 276 employees (currently 227 registered tableau users)
- Actual costs was used for personnel, external and software (no assumptions here)
- All costs were based on total costs expended as at sept 2022
4.7 Computing investment Cost
The below tables contain details of cost of investment in the following areas respectively:
- Software costs
- External costs
- Personnel costs
path <- here::here("datasets/roi_breakdown.xlsx")
crazy_sheet <- lapply(excel_sheets(path), read_excel, path = path)
crazy_sheet[[1]] %>%
mutate_at(vars(contains(c("yr_2022_09","yr_2022"))),~as.numeric(.)) %>%
kbl() %>%
kable_styling(bootstrap_options = "striped", full_width = F, position = "left")| Software | yr_2020 | yr_2021 | yr_2022_09 | yr_2022_forecast |
|---|---|---|---|---|
| Fivetran | 6220 | 21746 | 47930 | 58523.00 |
| DBT | 0 | 1432 | 3747 | 6500.00 |
| Metaplane | 0 | 720 | 1757 | 2377.61 |
| Tableau | 0 | 29090 | 82000 | 90000.00 |
| GA360 | 0 | 0 | 8032 | 24097.00 |
| Usercentrics | 0 | 4070 | 8961 | 8961.00 |
| Klipfolio | 1704 | 852 | 0 | 0.00 |
| Dbeaver | 0 | 0 | 0 | 384.00 |
crazy_sheet[[2]] %>%
kbl() %>%
kable_styling(bootstrap_options = "striped", full_width = F, position = "left")| year | nos_of_requests | no_of_employees | team_age | Internal Costs | external_costs | software_costs | overall_costs |
|---|---|---|---|---|---|---|---|
| 2020 | 40 | 2 | 0.5 | 62500 | 0 | 9944 | 72444 |
| 2021 | 240 | 5 | 1 | 169000 | 223650 | 59931 | 452581 |
| 2022 | 185 | 11 | 1.7 | 506250 | 373924 | 152427 | 1032601 |
4.8 Computing direct benefits (time saved)
Assuming 10 Execs/VPs, 50 Mid-level and 216 First level teamies at different hourly rates (150 €/h, 55€/h and 42€ per hr). Example, 10 Exec/VPs working 220 working days with minimum of 1.5hr per day of work saved.
Estimated Savings = 10 X 220 X 1.0 X 150 = €330000
path <- here::here("datasets/roi_breakdown.xlsx")
crazy_sheet <- lapply(excel_sheets(path), read_excel, path = path)
crazy_sheet[[3]] %>%
kbl() %>%
kable_styling(bootstrap_options = "striped", full_width = F, position = "left")| decision_makers | Nr_of_employees | example | hourly_rate | hours_saved_per_yr_excl_weekends | savings_per_yr_all_ employees |
|---|---|---|---|---|---|
| Top Level | 10 | CEO, COS, CMO, VP | 150 | 3300 | 330000 |
| Mid Level | 50 | Head Ofs | 55 | 16500 | 907500 |
| First Level | 216 | Team Leads, SMEs, Teamies | 42 | 47520 | 1995840 |
4.9 Compute ROI
Recall our ROI formula,
ROI = (Net Income from investment / cost of investment) X 100
OR
ROI = (Total Estimated Savings - Investments)/Investments
overall_costs <- crazy_sheet[[2]] %>%
filter(year == max(year)) |> pull(overall_costs)
tot_savings <- crazy_sheet[[3]] |>
janitor::clean_names() |>
summarise(total = sum(savings_per_yr_all_employees)) |> pull(total)
#Predicted Software Cost for 2022
software_total <- crazy_sheet[[1]] |>
janitor::clean_names() |>
mutate_at(vars(contains("yr_2022_forecast")), ~as.numeric(.)) |>
summarise(total = sum(yr_2022_forecast)) |> pull(total)
#Compute ROI
ROI <- roi(tot_savings,overall_costs)Hence, ROI as at Sept 2022 is 213.13%.
Predicted software costs for 2022 is €190842.61
4.10 Quantifying Impact of an additional data scientist
The value quantification in current scope is at team level and therefore not applicable to individuals. Also, as agreed, we will not need to hire this role anymore in 2023.
4.11 Conclusions on ROI
We ended up with a positive ROI in 2nd Year only (assuming -100% ROI for the 1st 6 - 9 months) based on tangibles like time savings. If we factor in the intangible elements then realistically, the overall ROI may be in the region of 3-5X (X = 213.13%). Since these assets do not depreciate with increasing employees (as recently seen with the US launch, they re-used existing reports/dashboards) or increased usage then the ROI will not only stay positive but increase in value for Year 3 and beyond as long as our business model remains the same.
There are very many specific use cases of significant ROI generated for specific teams who spent 3-6 hrs daily on redundant work which has now been eliminated after integration of data source requirements and subsequently creating automated self-serve reports on tableau.
With anticipated reduction/elimination of external costs in 2023, the ROI will continue its linear growth.